optimal selection
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The authors propose a two-stage feature selection approach for linear regression. Candidate features are selected in the first stage and the selection is further refined in the second stage. They define "Strong screening consistency" as a criterion for performing the initial stage, and then investigate the implications of that definition when the screening values are computed by a linear operator. From the content and the references it appears that the authors address the statistics community. My expertise is more in machine learning/computer science, and my review is written from that perspective. My main point is the following: The vector beta to be estimated has 0 value for its ith coordinate iff the ith feature should not be part of the optimal selection.
7810ccd41bf26faaa2c4e1f20db70a71-Reviews.html
The authors suggest the use of a criterion, Σ-Optimality, for active learning in Gauss-Markov random fields. The criterion itself was originally proposed by Garnett et al for active surveying, but it does not appear that the submodular property was recognized in that previous work. Labeled and unlabeled are embedded in a graph nodes represent both labeled and unlabled data and edge weights, computed via a kernel, capture similarity. The motivation for an active approach is that acquiring labels on the full data set may incur some cost (presumably greater than computing the edge weights over all data) so a criterion is used to determine which of the remaining unlabeled data should be labeled. The authors establish that the criterion satifies the submodular monotone property and as such greedy selection achieve (1-1/e) performance relative to optimal selection.
Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
Group-Sparse Model Selection: Hardness and Relaxations
Baldassarre, Luca, Bhan, Nirav, Cevher, Volkan, Kyrillidis, Anastasios, Satpathi, Siddhartha
Group-based sparsity models are proven instrumental in linear regression problems for recovering signals from much fewer measurements than standard compressive sensing. The main promise of these models is the recovery of "interpretable" signals through the identification of their constituent groups. In this paper, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues. In particular, we show that the group-model selection problem is equivalent to the well-known NP-hard weighted maximum coverage problem (WMC). Leveraging a graph-based understanding of group models, we describe group structures which enable correct model selection in polynomial time via dynamic programming. Furthermore, group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. We also present a generalization of the group-model that allows for within group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier of group-sparse approximations for two tractable models, among which the tree sparsity model, and illustrate selection and computation trade-offs between our framework and the existing convex relaxations.
Targeting Optimal Active Learning via Example Quality
Evans, Lewis P. G., Adams, Niall M., Anagnostopoulos, Christoforos
In many classification problems unlabelled data is abundant and a subset can be chosen for labelling. This defines the context of active learning (AL), where methods systematically select that subset, to improve a classifier by retraining. Given a classification problem, and a classifier trained on a small number of labelled examples, consider the selection of a single further example. This example will be labelled by the oracle and then used to retrain the classifier. This example selection raises a central question: given a fully specified stochastic description of the classification problem, which example is the optimal selection? If optimality is defined in terms of loss, this definition directly produces expected loss reduction (ELR), a central quantity whose maximum yields the optimal example selection. This work presents a new theoretical approach to AL, example quality, which defines optimal AL behaviour in terms of ELR. Once optimal AL behaviour is defined mathematically, reasoning about this abstraction provides insights into AL. In a theoretical context the optimal selection is compared to existing AL methods, showing that heuristics can make sub-optimal selections. Algorithms are constructed to estimate example quality directly. A large-scale experimental study shows these algorithms to be competitive with standard AL methods.